Tagging Inflective Languages: Prediction of Morphological Categories for a Rich, Structured Tagset

Authors

  • Jan Hajic
  • Barbora Hladká

Abstract

purposes, it has been tagged by our tagger; errors are printed underlined and corrections are shown. Hlavním/AAIS7----1A- problémem/NNIS7-----A--
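The excerpt above shows positional tags. The minimal Python sketch below splits such a tag into named morphological categories to make that structure concrete.

It rests on three assumptions:
- It uses the 15-position layout documented for the Prague Dependency Treebank (PDT). The 1998 tagset may differ in detail.
- The full 15-character tags in the example are hypothetical completions of the truncated tags in the excerpt.
- The names `PDT_POSITIONS` and `decode_positional_tag` are illustrative only.

```python
from typing import Dict

# Names of the 15 tag positions, assuming the PDT positional layout
# (the 1998 tagset may differ in detail).
PDT_POSITIONS = (
    "POS", "SubPOS", "Gender", "Number", "Case",
    "PossGender", "PossNumber", "Person", "Tense", "Grade",
    "Negation", "Voice", "Reserve1", "Reserve2", "Var",
)


def decode_positional_tag(tag: str) -> Dict[str, str]:
    """Map each position of a positional tag to its category name.

    Positions holding '-' (category not applicable) are omitted.
    Raises ValueError if the tag length does not match the layout.
    """
    if len(tag) != len(PDT_POSITIONS):
        raise ValueError(
            f"expected {len(PDT_POSITIONS)} positions, got {len(tag)}: {tag!r}"
        )
    return {name: value for name, value in zip(PDT_POSITIONS, tag) if value != "-"}


if __name__ == "__main__":
    # Hypothetical full-length versions of the truncated tags in the excerpt.
    examples = [("Hlavním", "AAIS7----1A----"), ("problémem", "NNIS7-----A----")]
    for form, tag in examples:
        print(form, decode_positional_tag(tag))
```

Under the PDT conventions, the decoded values read as follows:
- A or N: adjective or noun
- I: masculine inanimate
- S: singular
- 7: instrumental case
- 1: positive degree
- A: affirmative

A '-' marks a category that does not apply. Each position draws its value from a small inventory, so a tagger can in principle predict the categories separately rather than treat the whole tag as one atomic label.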


Similar resources

Morphological Tagging: Data vs. Dictionaries

Part-of-speech tagging for English seems to have reached human levels of error, but full morphological tagging for inflectionally rich languages, such as Romanian, Czech, or Hungarian, is still an open problem, and the results are far from satisfactory. This paper presents results obtained by using a universalized exponential feature-based model for five such languages. It focuses...

Part-of-Speech Tagging for Grammar Checking of Punjabi (The Linguistics Journal, Volume 4, Issue 1)

Part-of-speech (POS) tagging is one of the major activities performed in a typical natural language processing application. This paper explores part-of-speech tagging for the Punjabi language, a member of the Modern Indo-Aryan family of languages. A tagset for use in grammar checking and other similar applications is proposed. This fine-grained tagset is based entirely on the grammatical catego...

Tigrinya Part-of-Speech Tagging with Morphological Patterns and the New Nagaoka Tigrinya Corpus

This paper presents the first part-of-speech (POS) tagging research for Tigrinya (Semitic language) from the newly constructed Nagaoka Tigrinya Corpus. The raw text was extracted from a newspaper published in Eritrea in the Tigrinya language. This initial corpus was cleaned and formatted in plaintext and the Text Encoding Initiative (TEI) XML format. A tagset of 73 tags was designed, and the co...

Automatic Morphological Analysis for Russian: a Comparative Study

In this paper we present a comparison of ten systems for automatic morphological analysis: TreeTagger, TnT, HunPos, Lapos, Citar, Morfette, Mystem, Pymorphy, Stanford POS tagger and SVMTool. Different training and tagging approaches are discussed together with the strengths and weaknesses of each system. Probabilistic taggers were trained and tested on the Russian National Disambiguated Corpus a...

A Positional Tagset for Russian

Fusional languages have rich inflection. As a consequence, tagsets capturing their morphological features are necessarily large. A natural way to make a tagset manageable is to use a structured system. In this paper, we present a positional tagset for describing morphological properties of Russian. The tagset was inspired by the Czech positional system (Hajič, 2004). We have used preliminary ve...

Publication date: 1998